AITopics | great exploration

Collaborating Authors

great exploration

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration

Neural Information Processing SystemsDec-23-2025, 18:13:45 GMT

Batch reinforcement learning (RL) is important to apply RL algorithms to many high stakes tasks. Doing batch RL in a way that yields a reliable new policy in large domains is challenging: a new decision policy may visit states and actions outside the support of the batch data, and function approximation and optimization with limited samples can further increase the potential of learning policies with overly optimistic estimates of their future performance. Some recent approaches to address these concerns have shown promise, but can still be overly optimistic in their expected outcomes. Theoretical work that provides strong guarantees on the performance of the output policy relies on a strong concentrability assumption, which makes it unsuitable for cases where the ratio between state-action distributions of behavior policy and some candidate policies is large. This is because, in the traditional analysis, the error bound scales up with this ratio. We show that using \emph{pessimistic value estimates} in the low-data regions in Bellman optimality and evaluation back-up can yield more adaptive and stronger guarantees when the concentrability assumption does not hold. In certain settings, they can find the approximately best policy within the state-action space explored by the batch data, without requiring a priori assumptions of concentrability. We highlight the necessity of our pessimistic update and the limitations of previous algorithms and analyses by illustrative MDP examples and demonstrate an empirical comparison of our algorithm and other state-of-the-art batch RL baselines in standard benchmarks.

good batch off-policy reinforcement learning, great exploration, name change, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Review for NeurIPS paper: Provably Good Batch Reinforcement Learning Without Great Exploration

Neural Information Processing SystemsJan-21-2025, 13:05:24 GMT

Weaknesses: I also feel that the paper could have benefited from a discussion of these as compared to just outrightly saying that existing methods do not give us good results. In particular, the conditions under which existing methods work vs do not work should have been discussed more explicitly than what it is right now in the paper. Moreover, I think the experiments on cartpole and hopper are not indicative of their method's performance since these have determnisitc dynamics and the dataset was collected as trajectories (so s' is as frequent as s in the distribution \mu, see my point below) and hence their choice of masking reduces to action conditioned masking only. Some other questions that I have: - From the analysis perspective, the paper says that prior works such as Kumar et al. 2019 that use action conditional and concentrability do not get the same error rate. Is the main issue behind this limitation that the notion of concentrability used in Kumar et al. and other works is trajectory centric and not on the state-action marginal?

algorithm, concentrability, provably good batch reinforcement learning, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Review for NeurIPS paper: Provably Good Batch Reinforcement Learning Without Great Exploration

Neural Information Processing SystemsJan-21-2025, 13:05:17 GMT

This is a nice paper, with a new idea and strong theoretical backing. The reviews, rebuttal and discussion periods led to a lot of detailed feedback, so I'd encourage the authors to include as much of this as possible in the camera-ready version, and specifically revise for clarity around the points that were unclear to the reviewers in the first submission.

great exploration, neurips paper, provably good batch reinforcement learning

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration

Neural Information Processing SystemsOct-9-2024, 12:47:14 GMT

concentrability assumption, good batch off-policy reinforcement learning, great exploration, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback